Emergent multilevel selection in a simple spatial model of the evolution of altruism

Theories on the evolutionary origins of altruistic behavior have a long history and have become a canonical part of the theory of evolution. Nevertheless, the mechanisms that allow altruism to appear and persist are still incompletely understood. It is well known, however, that the spatial structure of populations is an important determinant. In both theoretical and experimental studies, much attention has been devoted to populations that are subdivided into discrete groups. Such studies typically imposed the structure and dynamics of the groups by hand. Here, we instead present a simple individual-based model in which altruistic organisms spontaneously self-organize into spatially separated colonies that themselves reproduce by binary fission and hence behave as Darwinian entities in their own right. Using software to automatically track the rise and fall of colonies, we are able to apply formal theory on multilevel selection and thus quantify the within- and among-group dynamics. This reveals that individual colonies inevitably succumb to defectors in a within-colony “tragedy of the commons”. Even so, altruism persists in the population because more altruistic colonies reproduce more frequently and drive less altruistic ones to extinction. Evidently, the colonies promote the selection of altruism but in turn depend on altruism for their existence; the selection of altruism hence involves a kind of evolutionary bootstrapping. The emergence of the colonies also depends crucially on the length scales of motility, altruism, and competition. This reconfirms the general relevance of these scales for social evolution, but also stresses that their impact can only be understood fully in the light of the emergent eco-evolutionary spatial patterns. The results also suggest that emergent spatial population patterns can function as a starting point for transitions of individuality.


Introduction
Over the past decades, a rich body of theoretical research has been devoted to the evolution of social behaviors [1,2]. In particular, much theory has focused on the evolution of cooperation [3,4], and more narrowly, altruism [5,6]: behavior that is costly to the actor but beneficial to its interaction partners. Historically, how natural selection could favor altruism has been a puzzle, but in broad terms the solution has long been understood: altruism can be selected if its benefits accrue disproportionately to altruists, thus offsetting their costs [3,[7][8][9]. Nevertheless, the mechanisms that allow such an interaction structure to arise and persist are still a matter of intense study and debate [4].
Many classical studies considered populations that are subdivided into distinct groups (e.g., [10][11][12][13][14]). In such a group structure, altruistic behavior can be selected provided altruists tend to be grouped together and groups with a higher proportion of altruists tend to have higher mean fitness [11]. In nearly all theoretical models of group selection, the group structure and group-level dynamics are imposed or presupposed by the definition of the model. In contrast, we here present a very simple individual-based model in which altruistic organisms self-organize into discrete colonies. Moreover, these colonies themselves spontaneously reproduce by growth and binary fission and hence act as Darwinian entities in their own right. In time, each individual colony is fated to collapse; but when it does, another colony grows and divides, giving rise to the kind of multi-level dynamics that in most previous models had to be imposed by hand [14][15][16]. Such rudimentary, emergent higher-level entities could be a first step towards a full "transition of individuality" [17].
The model describes a spatial environment inhabited by motile organisms that reproduce and interact locally. This it has in common with a significant class of earlier models that use various modeling formalisms and have been analyzed using a variety of techniques (e.g., [15,[18][19][20][21][22]). In such models, evolutionary and ecological processes become intimately intertwined because the evolutionary processes are strongly affected by spatial self-structuring, while the self-structuring in turn depend on evolved traits.
Both group-structured and spatially explicit models have demonstrated that local interactions combined with local mating and reproduction can foster altruism if motility is limited, because this allows altruists to aggregate in assorted groups or neighborhoods where they mainly benefit each other [19,[23][24][25]. However, such models have also revealed important limitations [26][27][28][29]. If not only social interactions but also competitive interactions take place locally ("soft" selection [30,31]), altruists in assorted groups or domains tend to compete with other altruists, in which case the benefits of altruism may be largely or fully canceled by the concomitant increased competition. This local Malthusian trap is alleviated somewhat if the local carrying capacity increases with the proportion of altruists ("elastic" selection), which allows clusters of altruists to become net population sources [32]. Importantly, it can also be avoided if competitive interactions reach beyond the social group or neighborhood, so that altruists in assorted clusters can support each other at the expense of others [24,26,29,33]. This highlights the importance of the relative scales of motility, altruism, and competition [1]. As a rule, altruism is favored by limited motility and local social interactions, but global competition.
Long-range competition can come in many implicit forms. For instance, the life cycle of organisms may include a dispersal stage such that individuals can first cooperate with relatives and then compete with non-relatives [26]. Alternatively, the group-level dynamics may periodically include a global mixing stage in which groups or neighborhoods are fragmented and new ones are seeded [34][35][36]; or groups may occasionally go extinct, after which propagules derived from preexisting groups compete to recolonize their spot [16,37]. To study the effects of the scales of motility, altruism and competition systematically, the model presented here is deliberately designed such that these scales can be set explicitly and independently. As it turns out, their role is much more intricate than anticipated because they play an essential role in the emergence of the colonies and hence in the resulting multilevel eco-evolutionary dynamics.
To quantitatively analyze model simulations, we use software that automatically tracks the rise and fall of colonies. Subsequently, we apply existing formal theory to quantify the contributions to selection at the individual and colony levels. This demonstrates that, within colonies, natural selection favors defectors who profit from the altruists in their neighborhood but do not share in the costs. But colonies characterized by a higher average level of altruism survive longer and reproduce more frequently, resulting in positive selection at the colony level. The steady level of altruism that eventually establishes can be understood as a balance between these forces: a perpetual "tragedy of the commons" [38] within colonies, compensated by positive selection among them.

Brief description of the model
We start with a brief specification of the model; details are supplied in section Materials and methods.
The model considers a population of discrete individual in a two-dimensional (2D) or onedimensional (1D) habitat (see Fig 1A). Individuals possess just one continuous trait ϕ, representing their investment in altruistic behavior, and they do only three things: move, in an unbiased fashion modeled by diffusion; die, at a constant (Poisson) rate; and reproduce asexually.
The rate of reproduction of each individual depends on three quantities. First, it decreases with the individual's own investment in altruism: altruism is costly. A level of altruism of ϕ = 0.05 means that the individual sacrifices 5% of its reproduction rate relative to a defector (ϕ = 0) under the same conditions. Second, the reproduction rate also decreases with the population density in the individual's local neighborhood. This models competition for resources and establishes a finite carrying capacity. The local population density is measured as a Kernel Density Estimate (KDE), using a normal distribution with standard deviation σ rc as the kernel function. This means that individuals compete strongly with each other only if their spatial separation is of order σ rc or less (see Fig 1B, red line, and Fig 1C), so that σ rc can be interpreted as the scale of competition. Third, an individual's reproduction rate increases if altruists are present in its local neighborhood (Fig 1B, green line, and Fig 1D). The altruism experienced at a given position y, denoted A(y), is again quantified as a KDE, but now individuals are weighted in proportion with their level of altruism ϕ. Although the model is not intended to mimic a specific altruistic behavior or mechanism, it is convenient to think of A(y) as the concentration of some public good secreted by altruistic organisms. If more and more public good is added to the local neighborhood, the benefit eventually saturates. The standard deviation of the kernel function used to calculate A(y) is called σ a and generally differs from σ rc . Because individuals profit significantly from the public good produced by others only if their separation is of order σ a or less, σ a can be interpreted as the scale of altruism.
It is worth emphasizing that, contrary to some other models [20,22], complete defectors (with ϕ = 0) are perfectly viable; altruism is not required for the survival of the population.
When an individual reproduces, the offspring appears at the coordinates of the parent; afterwards, parent and offspring move independently and thus part ways. Offspring usually inherit the trait value of the parent, but with a small probability a mutation occurs that increases or decreases it at random.
In simulations, space and time are discretized, and periodic boundary conditions are imposed. Default parameter values are listed in Table 1. Throughout the text, the time unit is the inverse of the death rate, called the "generation time". Importantly, the scale of altruism σ a is used as the unit of length and hence σ a = 1 by definition. Thus, just two length scales remain: the scale of competition σ rc and the scale of motility, σ m . The latter is defined as the typical (that is, root-mean-square) distance traveled by an individual in a generation time (see section Materials and methods).

Emergent colonies and multilevel dynamics
The complex behavior of the simple model is illustrated in Fig 2, which presents results of a single simulation run using a 2D habitat. These results are representative for the default parameters (see replicates in S1 Fig) but the parameters themselves have been chosen deliberately to enable the evolution of altruism. In particular, motility is slow and the scale of competition σ rc is four times larger than the scale of altruism σ a .
As shown in Fig 2A, all individuals are initialized as defectors, but in time the mean level of altruism steadily increases (thick colored line) before reaching a plateau. To confirm that this rise is largely due to natural selection rather than random drift or mutational bias, we measured the cumulative contribution of natural selection (black line), which is consistently positive (also see S1 Fig and S1 Text Section 3). The surprising spatial dynamics of the simulation are visualized in the snapshots of Fig 2B and, more vividly, in the S1-S3 Videos. While individuals are initially distributed uniformly at random, they spontaneously organize into dense colonies surrounded by "exclusion zones". These colonies subsequently organize into a hexagonal pattern; to illustrate this, a hexagonal grid is overlaid in the right-most panel of Fig 2B. To further characterize the pattern we determined the radial distribution function, which is defined as the distribution of distances between all pairs of individuals, normalized by the random expectation ( Fig 2C). The longranged oscillations in this distribution reveal a lattice constant of a � 8.4, consistent with estimates based on the number of colonies found in the habitat (see section Materials and methods). The mechanism producing the pattern is analogous to that of the famous Turing patterns in reaction-diffusion systems [39], as we will discuss in some detail below and in section Materials and Methods.
But the pattern is not static. In Fig 2D, enlargements are shown of a small region of the habitat. Consider the colony marked by the red circle. Initially, the colony is mostly blue, indicating that most individuals in this colony are highly altruistic. In time, however, the color degrades from blue to brown, reflecting a decline in altruism, and eventually the colony goes extinct. As best seen in the S1-S3 Videos, this fate is bestowed on many colonies in the simulation. This suggests that altruistic colonies are sensitive to corruption by defectors that occasionally appear by mutation or invasion from neighboring colonies, resulting in a withincolony "tragedy of the commons" [38]. In the same figure, however, green arrows point to what happens after a colony disappears: a different colony nearby initially grows in size and then spontaneously divides in two, locally restoring the hexagonal pattern. Daughter colonies inherit their over-all color from their parent colony. Importantly, it appears in the S1-S3 Videos that colonies with a high mean level of altruism divide particularly rapidly and thus manage to multiply and spread.
All in all, these observations suggest that the colonies themselves behave like Darwinian entities: they reproduce by binary fission and show heritable variation in their level of altruism. Moreover, in view of the tragedy of the commons seen within colonies, the colony-level dynamics appear crucial for the evolution of altruism.

Colony formation is crucial for the evolution of altruism
So far, the evidence presented on the dynamics of colonies has been anecdotal and qualitative.
To further study the behavior of the model and obtain quantitative results, we now shift to a one-dimensional habitat. Simulations with a one-dimensional habitat are considerably faster, allowing many parameter settings to be explored, and are analyzed more readily, both mathematically and computationally. Automated multilevel lineage tracking. Qualitatively, the behavior of the 1D model is analogous to that of the 2D model. Fig 3A shows a section of the space-time arena for a simulation with default parameters (see Table 1). The left-hand side of the figure (gray scale) presents the population density. The striped pattern clearly reveals the formation of regularly spaced colonies that can persist for thousands of generations. An algorithm was used to detect these colonies automatically and track them in time (see section Materials and methods). The righthand side of the figure plots the center of mass of the tracked colonies; colors represent the  Table 1). (A) Mean level of altruism versus time (thick colored line) as well as the cumulative contribution of natural selection (black), which is consistently positive (see S1 Text, section 3). (B) Snapshots of the simulation habitat; also see the S1-S3 Videos. In time, the population self-organizes into a hexagonal pattern of discrete colonies. A section of a hexagonal grid is superimposed in the right-most panel. (C) The hexagonal pattern is also apparent from the radial distribution function at t = 8000, the distribution of distances between pairs of individuals normalized by the random expectation. Black arrows indicate the distances occurring in an exact hexagonal grid with grid constant a = 8.4 and their relative frequency. (D) Enlargements of a small domain of the habitat, showing that the colonies behave like Darwinian entities: they disappear as a result of a within-colony tragedy of the commons [38] (red circle and cross), and reproduce by binary fission (green arrows).
https://doi.org/10.1371/journal.pcbi.1010612.g002 mean level of altruism of the individuals populating the colonies. In the middle part of the figure, density and traces overlap to showcase their consistency. Some traces suddenly end, indicating that the colony went extinct. Such events are detected automatically and indicated with a black square. From the figure, it is apparent that prior to the death of a colony the mean level of altruism always declines, suggesting a tragedy of the commons. In other places, traces suddenly fork, which is also automatically marked, this time with orange circles. Clearly, the colonies in the 1D habitat reproduce by binary fission (like their 2D counterparts); the daughter colonies inherit their color from their parent. Again it appears that more altruistic colonies divide more frequently.
Colonies emerge due to a linear instability. The mechanisms behind the emergence of colonies in the 1D habitat can be studied mathematically using linear stability analysis (LSA). We envision a population of individuals with a fixed level of altruism ϕ homogeneously distributed over a large habitat that is populated at carrying capacity. Next we superimpose a tiny periodic density variation with some wavelength λ and derive under which conditions this perturbation is expected to grow exponentially, resulting in "colonies". This also allows us to make approximate predictions on the wavelength of the emerging pattern, i.e., the distance between colonies. Details are found in section Materials and methods and S2 Fig. The LSA reveals that colonies are expected to develop only for certain combinations of the scales of altruism, competition, and motility ( Fig 3C; remember that σ a = 1 by definition). As the LSA elegantly demonstrates, the appearance of colonies is determined by a tug of war between these three forces. Altruism by itself tends to amplify differences in local density: areas with a high density contain more altruists, which positively affects the reproduction rate and hence further increases the density. This ultimately drives the emergence of colonies. However, this force is weak for density variations with a wavelength shorter than � σ a , which average out within the scale of altruism. Resource competition tends to quench density differences because it suppresses reproduction in densely population areas. This force, however, is weak for variations with wavelengths shorter than � σ rc , which are averaged out within the scale of competition. Lastly, random motility also tends to homogenize the density, but this force is ineffective against variations with wavelengths larger than � σ m because random motion is famously slow at large scales. Together, this means that colonies form only if σ m is small compared to the other scales and σ a is clearly smaller than σ rc , so that wavelengths exist that are too long to be quenched by motility, long enough to be amplified by altruism, but too short to be suppressed by resource competition (see S2(A) Fig).
To test these predictions, Fig  Colony formation by altruists enables evolutionary bootstrapping. From the observations of both the 1D and the 2D model it appeared that the emergence of colonies is important for the evolution of altruism. If so, appreciable levels of altruism should evolve only in the parameter regime where colonies can emerge given reasonable levels of altruism (the linearly unstable, yellow region of Fig 3B and 3C). This is confirmed by a series of simulations for various scales of motility and competition ( Fig 3D). Because the colony formation depends on altruism, but the selection of altruism in turn depends on the formation of colonies, the process must pull itself up by the bootstraps. Random mutations plus local reproduction spontaneously result in unstable colonies with modest levels of altruism and high internal levels of drift. This occasionally produces a colony that is altruistic enough to reproduce, which starts to spread rapidly.
Factors other than the spatial length scales clearly also affect whether altruism prevails. If the scale of competition becomes too large relative to the scale of motility, the mean level of altruism suffers (Fig 3D, e.g. at σ rc = 5 and σ m = 0.1). Also, the stability of colonies against corruption by defectors is affected by the rate with which such defectors are created by mutations. In line with this, the mean level of altruism decreases if the mutation probability is increased ( Fig 3E). That said, altruism emerges for a broad range of mutation probabilities.

Colonies die as a consequence of competition
Above, we stressed that even completely selfish individuals with ϕ = 0 are viable in isolation. In this light, the observation that colonies characterized by a low mean level of altruism tend to go extinct demands an explanation. In the model, only two factors can suppress the fitness of individuals: the cost of altruism, and competition. The cost of altruism cannot explain the extinctions because it tends to be low rather than high in colonies with low levels of altruism. By elimination, the extinctions must result from competition with neighboring colonies.
Key is the observation that colonies with a high level of altruism develop a higher local population density because the common good allows an increased reproduction rate. As a consequence, a mostly selfish colony that finds itself next to a particularly altruistic one experiences increased competition. This increased competition suppresses its population number. In response, the altruistic colony in fact experiences less competition, allowing it to expand its population number and to move closer to its neighbor. The latter phenomenon can consistently be seen in Fig 3A: briefly before a colony goes extinct, neighboring altruistic colonies tend to encroach on it. Thus, sufficiently altruistic colonies can drive sufficiently selfish ones to low population numbers and ultimately extinction.
When a colony is about to disappear, sometimes a few of its surviving members are absorbed by the colony that takes its place. Now infected with defectors, that colony often rapidly declines and ultimately disappears as well. Under default parameters this process does not allow lineages of defectors to coexist indefinitely with altruists: in simulations in which no new mutations are introduced after the first 80 000 generations, defectors quickly disappear (see S3 Fig).

Quantitative measurement of multilevel selection components
To formally analyze and quantify the role of the colony dynamics in the selection of altruism, we make use of two existing mathematical results. Both are based on subtly different formalizations of the concept of group selection that are sometimes referred to as multilevel selection (MLS) 1 and 2 [13,40] (see S1 Text for brief derivations).
The two results rely on different applications of the Price equation [41][42][43]. The Price equation decomposes the change in the population mean of a trait ϕ over a time interval Δt into two parts: the selection differential S, which quantifies the contribution of natural selection, and the transmission term T, reflecting systematic differences between the trait value of ancestors and their offspring. MLS 1 is based on the fact that, in a population that is subdivided into groups, the selection differential S can be split into two components: S = S within + S among . Here S within is the (weighted) mean of the selection differential measured within groups, while S among is the covariance between a group's mean trait value and its mean fitness, which can be interpreted as the selection among groups.
Our observations suggested that selection within colonies tends to be negative, which is to say that S within is systematically negative. To compensate, S among would have to be positive as a rule. To test this, we split the simulation illustrated in Fig 3A into 2000 time intervals of 80 generations each and calculated S within and S among for each of these time intervals. The results, plotted in Fig 4A, confirm the expectations. Over the last 1000 time intervals (shaded background in Fig 4), the mean level of altruism no longer changed significantly. (Mean change per time interval is (0.9 ± 2.0) × 10 −5 , where the uncertainty denotes a 95% confidence interval; see section Materials and methods.) But over the same period, the within-colony component of selection was negative during 97.6% of the time intervals, averaging (−3.8 ± 0.4) × 10 −4 . In contrast, the among-colony component was positive in 98.7% of the time intervals, with a mean of (4.2 ± 0.4) × 10 −4 . Hence, selection within colonies is indeed negative (reflecting the withincolony tragedy of the commons), but this is compensated by a positive among-colony component of selection.
The analysis of MLS 1 applies the Price equation to the population of individuals. MLS 2 instead applies it to the population of colonies. The mean level of altruism of individuals in a colony, F, is now considered a trait of that colony, and the fitness of a colony is defined as the number of offspring colonies that it has at the end of the time interval. The Price equation can then be used to describe the change in the mean level of altruism of the colonies. The selection differential S now measures whether colonies with a high value of F tend to produce more offspring colonies, and hence can be interpreted as the colony-level selection on F. The transmission term T now quantifies to what extent the F-value of offspring colonies systematically differs from those of their ancestral colonies. Hence, T characterizes the internal evolution of colonies. (Generally this internal evolution cannot directly be interpreted as internal selection, as other evolutionary forces may contribute to it.) It appeared that colonies with a higher mean level of altruism tended to have a higher fitness, suggesting that the colony-level selection is predominantly positive. To test this, we applied the MLS 2 framework to the same 2000 time intervals used for the MLS 1 analysis, making use of the automatically acquired colony-level lineage traces to measure the fitness values of the colonies. The result, plotted in Fig 4B, again confirms the expectations. Over the last  1) is to mathematically split the natural selection measured acting on the organisms into two parts: selection within and selection among colonies [42]. (See S1 Text Section 4.) For the simulation shown in Fig 3A, this calculation was done for each subsequent interval of 80 generations (see section Materials and methods). Plotted is the change in the mean level of altruism (black), and the within-colony (red) and among-colony (blue) components of selection. The rotated histograms on the right-hand side show distributions based on second half of the simulation, indicated with the gray background. In this part of the simulation, the population mean level of altruism no longer changed systematically. However, within-colony selection is nearly always negative, compensated by positive among-colony selection. (C) The second approach (MLS 2) describes evolution at the level of the colonies. (See S1 Text Section 5.) Evolution taking place within colonies then appears as a transmission bias: a bias in the change between ancestor and offspring colonies in the colony mean level of altruism. This transmission bias (red) tends to be negative, but is compensated by positive selection at the colony level.
https://doi.org/10.1371/journal.pcbi.1010612.g004 1000 time intervals, the mean level of altruism of colonies � F no longer changed significantly. (Mean change per time interval: (0.9 ± 2.0) × 10 −5 .) In the same window, colony-level selection was positive in 82.2% of the time intervals, and negative in only 1.6%. (In the remaining intervals none of the colonies reproduced or died, resulting in a colony-level selection of precisely 0.) Its mean value was (4.7 ± 0.5) × 10 −4 . This was compensated by the colony-level transmission term, which was negative in 97.5% of the intervals, with an average of (−4.6 ± 0.4) × 10 −4 . From this we conclude that the mean level of altruism of individual colonies tends to decrease with time, compensated by an increased fitness of colonies with a higher level of altruism.
We suspected that altruistic colonies are fitter for two reasons: they are more likely to reproduce but also less likely to die. Mathematically, the relative importance of these two mechanisms can be determined by splitting the colony-level selection of MLS 2 into two terms measuring the contribution of each; see S1 Text Section 6 for a derivation. We applied this analysis to simulations. As seen in S5 Fig, both terms are overwhelmingly positive, confirming that both mechanisms contribute. That said, the fact that colonies with a higher level of altruism are less likely to die is more consequential by a factor of two.

Discussion
Above, we have presented a simple model of the evolution of altruism. Despite its simplicity, the model displays complex dynamics. Under suitable parameter settings, a linear instability permits a process of evolutionary bootstrapping in which altruistic colonies emerge that themselves reproduce by binary fission. Quantitative measurements demonstrated that defectors have the upper hand within colonies, but that colonies with a higher mean level of altruism reproduce more frequently and survive longer. The net effect is that a significant level of altruism develops and persists in the population.
Complex biological systems invariably show a hierarchical organization, with collectives of individuals at one level forming entities at a higher level. The evolution of such hierarchical structures involves transitions in which collectives of individuals start to behave as individuals: units of evolution [44] showing group adaptation [45]. An important open question in evolutionary theory is what mechanisms and conditions allow such transitions in individuality to take place [46]. While many theoretical models study evolution in hierarchical population structures, most take this structure as given (e.g., [14,47]) and hence do not address the evolution of the hierarchical structure as such. In rare cases, particular parameters of the group structure are allowed to evolve concurrently with the social behavior (e.g., [48]), but even then the existence and nature of the group-level life cycle is still presupposed. In other models the hierarchical structure is spontaneously produced by the spatial dynamics, but the aggregates cannot naturally be considered Darwinian entities because they are either too short-lived or do not replicate in a clear-cut sense (e.g., [49][50][51]). In yet other models, the formation of collectives is initially "scaffolded" by preexisting environmental structure [46]. In this light, a distinguishing feature of the current model is that group-level Darwinian entities emerge spontaneously by self-organization; to our knowledge, few other models have this property (but see [52]).
The formation of spatial density patterns is very common in nature and can result from various mechanisms [53]. The model that we presented reconfirms that, even in the absence of a preexisting ecological scaffold, such ecological self-organization can naturally result in competition and replication at the level of aggregates. The aggregates emerging in our model are subject to considerable internal evolution; hence, although most would acknowledge them as units of selection [54], they do not qualify as units of evolution in the sense of Maynard Smith [44,55] or as a level of adaptation in the sense of Gardner & Grafen [45]. Possibly, however, once natural selection is able to act at the level of such emergent aggregates, this opens an avenue towards a full transition of individuality.
Classical theory argues that the relative scales associated with motility, social interactions and competition are of crucial importance for the evolution of altruism (see Introduction).
The results of the model confirm this: as expected, altruism evolves only if motility is limited and the scale of competition is larger than the scale of altruism. The significance of the scales, however, is much more involved than anticipated because they largely determine the emerging ecological patterns, which in turn shape the evolutionary dynamics. Indeed, the evolutionary dynamics support altruism (Fig 3D) only if the ecological dynamics support the formation of colonies (Fig 3C). This emphasizes that it is unlikely that the eco-evolutionary behaviors of complex dynamical models can be summarized by generic rules of thumb.
The formation of new colonies through the fission or budding of existing ones is in fact common in social mammals and insects, and its relevance for the evolution of cooperation and altruism has been demonstrated in earlier models (e.g., [16,[56][57][58]). In their formulation, these earlier models tend to differ strongly from the one presented here. They explicitly specify the dynamics of group-level events, such as group extinctions and recolonization by buds or propagules, as well as the effect of social behaviors on these dynamics; in addition, they usually do not explicitly include spatial structure within or among groups. Nevertheless, the emergent dynamics of our model are qualitatively reminiscent of such models. An important lesson drawn from earlier models is that altruistic traits can evolve if they decrease the rate of colony extinctions, increase the recolonization rate of propagules, or permit an increase in colony size [16]. As we have seen, in our model the altruism achieves all three effects to some extent. Superficially, the model is also reminiscent of the ecological public-good (EPG) games of Wakano et al [22], which also produce intricate spatial patterns, including Turing patterns. But upon closer inspection the two models differ fundamentally in multiple aspects. In the EPG model all interactions are entirely local. Its pattern formation depends on a coexistence equilibrium point that has no counterpart in the current model, and parameters are chosen such that defectors are not viable without altruists. In addition, a necessary condition for the Turing patterns of the EPG model is that defectors are more motile than altruists; this distinction does not exist in our model. Importantly, the authors do not report that their colonies replicate, although perhaps such behavior could be obtained in a particular parameter regime. To sum up, the mechanisms producing the spatial patterns qualitatively differ between the two models and the EPG model does not display similar multilevel dynamics.
The concepts of multilevel and group selection and their relation to inclusive fitness theory are the subject of a longstanding and fierce debate [59][60][61][62][63]. Here, we do not engage in this debate. Given the remarkable colony dynamics of the model, a multilevel perspective is particularly natural and allowed us to test relevant hypotheses. But several other theoretical frameworks and fitness-accounting schemes [9] could be applied as well to address different questions. To illustrate this, S6 Fig presents an inclusive-fitness analysis of one of the 1D simulations with default parameters. The analysis confirms that altruism is individually costly but benefits interaction partners, who tend to be highly related.
In the literature, multiple conceptualizations of multilevel selection exist, including MLS 1, MLS 2, and Contextual Analysis [40,64]. As to which of these methods best captures the concept of multilevel selection no consensus has yet been reached [13]. Above, we have used the decompositions of MLS 1 and 2 essentially as alternative descriptive statistics, each measuring different but well-defined properties of the system. For example, the colony-level selection term of MLS 2 confirms that more altruistic colonies have more offspring, irrespective of whether one considers this term a proper (or even the sole) measure of the concept of group selection. Similarly, the within-group contribution to selection of MLS 1 is a useful measure to test the hypothesis that selection within colonies is negative on average. We could have applied Contextual Analysis too, to confirm associations between individual or contextual properties and fitness [31,40]. Their conceptual caveats notwithstanding, each of these methods provides a different vantage point and potentially new insights. The features of our model make it ideally suited to illustrate and test the various approaches to multilevel selection.
Models of evolution in subdivided populations usually assume that the fitness of individuals depends solely on their own traits and those of other group members. Any spatial structure at the sub-colony scale is thus ignored. Moreover, it is typically assumed that groups compete equally among each other, ignoring spatial structure at scales beyond the size of a group. In applications, these assumption are approximations at best, and they certainly do not hold in our model because colonies are not homogeneous and the colony-level dynamics clearly result in spatial assortment of colonies (see Figs 2B and 3A). This does not invalidate MLS theory, but serves to remind us that we cannot expect to obtain a complete picture from a single vantage point. In a forthcoming article we describe a complementary multiscale approach that allows natural selection to be decomposed into contributions at each spatial scale [65]. This approach can be used to analyze the importance of structures below and beyond the colony level; it is also applicable to models that generate spatial structures such as spirals and waves that are relevant to selection but perhaps too ephemeral to be conceptualized as groups. More generally, over the years many evolutionary concepts have been formalized mathematically [66], but these results are rarely applied to computational and individual-based models. Despite each formalism's limitations, together they provide a valuable toolbox that allows models to be scrutinized quantitatively from multiple perspectives [40,67]. We hope that future studies take full advantage of its potential.

Materials and methods
Detailed description of the model Definition of the model. We envision a population or individuals living in a large habitat, which can be one-or two-dimensional. Each individual is fully characterized by its spatial coordinates plus the value of a single quantitative trait ϕ, which indicates its investment in altruistic behavior. The behavior of the individuals is defined by just four stochastic processes: death, motility, reproduction, and heredity with mutation.
Death strikes each individual at a fixed (Poisson) rate d; if an individual dies, it disappears from the population. The average lifespan of an individual is d −1 , which we call the generation time.
Motility is modeled as unbiased diffusion with diffusion constant k D . It follows that, in a generation time, the root-mean-square displacement of an individual in each spatial dimension is s m � ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 2k D =d p , the scale of motility. It can be interpreted as the "typical" distance traveled by an individual during its lifetime. Note that we ignore that individuals take up space: nothing prevents multiple individuals from being at the same position at the same time.
Reproduction is asexual. When an individual reproduces, a new individual is placed at the same position as the parent. The rate of reproduction of each individual is negatively affected by the level of altruism of the individual itself and by competition for resources with other individuals; in contrast, it is positively affected by the altruism of others in its local environment. To implement these effects mathematically, we make use of two quantities that we will now introduce.
First, we define the local population density D(y; σ rc ) at position y as a conventional Kernel Density Estimate (KDE): Here, the summation runs over all individuals i in the population; x i is the position of individual i; and the kernel function G rc (x; σ rc ) is the Gaussian (normal) distribution (univariate or bivariate, depending on the dimensionality of the habitat) with standard deviation σ rc . By this definition, the population density at position y is high if many individuals are found within a distance of order σ rc from y. The parameter σ rc is called the scale of competition because, as explained below, it determines the range of competitive interactions. Second, the altruism experienced by an individual at position y is measured as This is again a KDE, except that each individual i is weighted by its level of altruism ϕ i . It is convenient to think of A(y; σ a ) as the availability of some public good that organisms secrete locally in proportion to their level of altruism. The summation in Eq 2 runs over all individuals, and the contribution of each individual to the public good at position y decreases with their distance to y according to a Gaussian kernel function G a . The standard deviation σ a of the kernel function is referred to as the scale of altruism because it determines the range of altruistic interactions. In terms of these definitions, the full equation for the reproduction rate g i of individual i reads Here, g 0 is the basal reproduction rate. In the subsequent factor (labeled "factor 1"), the term −cϕ i implements a deficit in the reproduction rate owing to the individual's investment in altruism; the parameter c determines the price of altruism. The last term in factor 1 implements the advantage obtained from the altruism of others. The advantage grows as b 0 A(x; σ a ) if A(x; σ a ) is small but saturates at b max if A(x; σ a ) is large. Factor 2 introduces resource competition: it decreases linearly with the local population density D(x; σ rc ) such that reproduction is locally inhibited when the density approaches K. (In practice, the population density stabilizes somewhat below K, where the average reproductive rate equals the death rate d.) The max [.] function is required because both factors 1 and 2 could in rare cases become negative; in that case, g i is set to 0.
Heredity and mutation are implemented as follows. Upon reproduction, the offspring usually inherits the value ϕ of the parent, but with probability μ a mutation occurs. In that case, the ϕ-value of the offspring is determined by adding a random change δϕ to the value of the parent. The absolute value of δϕ is drawn from an exponential distribution with mean m, and its sign is positive or negative with equal probability. A concern with this procedure, however, is that the resulting trait value ϕ of the offspring can become negative. In simulations with a 2D habitat this was not permitted and in such events the value was instead set to 0. Although this is a natural choice, it introduces a mutational bias (see S1 Fig), which would complicate some of the analyses performed on the 1D version of the model (in particular, Fig 3D and 3E). In the simulations of the 1D system ϕ was therefore allowed to become negative, but the behavior of the individuals was determined by the "effective" value ϕ E = max[ϕ, 0] rather than by ϕ itself. In Fig 3D and 3E the mean of ϕ E is plotted. In Fig 3A the colony mean of ϕ is plotted, but the distinction is immaterial because in this window of the simulation negative values of ϕ are rare.
Units and parameter reduction. We are free to choose convenient units for length, time, and the trait ϕ; thus, three parameters can be eliminated. First, we choose the unit of length such that the scale of altruism σ a equals 1 by definition. The two other length scales that exist in the model, σ rc and σ m , are therefore expressed relative to σ a . Second, units of time are chosen such that the generation time d −1 is 1. This implies that the death rate d also equals 1 by definition. Third, the unit of the trait value ϕ is chosen such that the parameter c (see Eq 3) equals 1. This simplifies the interpretation of ϕ: an individuals with trait value ϕ directly sacrifices a fraction ϕ of its basal reproductive rate to the public good. Note, however, that the summation in Eq 2 runs over all individuals, so that each individual also benefits from its own altruism. In the literature, a distinction is sometimes made between soft and hard altruism, depending on whether the direct benefits that altruists reap from their behavior outweigh the direct costs [47]. We always choose parameters such that the direct costs far exceed the direct benefits, modeling hard altruism. The contribution of individual i to the public good A(x i ; σ a ) at its own position x i is given by � i 2ps 2 a in the 2D habitat: From Eq 3 and the fact that σ a = 1 by definition, it then follows that the reproductive advantage due to one's own altruism is bounded by b 0 � i = ffi ffi ffi ffi ffiffi 2p p (in the 1D habitat) or b 0 ϕ i /(2π) (in the 2D habitat). This reproductive advantage cannot outweigh the deficit of cϕ i = ϕ i unless b 0 > ffi ffi ffi ffi ffiffi 2p p (in the 1D case) or b 0 > 2π (in the 2D case); we steer clear of this regime by choosing b 0 appropriately small.

Implementation of the simulations
Simulation scheme. In the simulations, continuous space is approximated by a linear grid (in the 1D habitat) or a square grid (in the 2D habitat) with grid cells of linear size δx. Periodic boundary conditions are imposed. Time is divided into computational time steps δt.
During each computational time step, the state of the system at time t + δt is constructed based on the state at time t by the following sequence of steps: Step 1. Calculate reproduction rates First, the density D(x; σ rc ) and the availability of public good A(x; σ a ) at each position x are computed, taking into account the periodic boundary conditions. After this, the reproduction rates g i of all individuals can be calculated.
Step 2. Reproduction and mutation Each individual i in the field reproduces with probability g i δt. The offspring is mutated with probability μ, as described above.
Step 3. Death Each individual subsequently dies with probability d δt.
Step 4. Motility Each individual is displaced in each spatial dimension by a distance drawn at random from a discrete approximation of a Gaussian distribution with mean 0 and standard deviation ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi 2k D dt p .
Initial conditions. The steady-state population density of a population of defectors (ϕ i = 0) is approximately (1 − d/g 0 )K, which can be derived by solving g i = d under the assumption of a homogeneous population distribution. Therefore, the initial condition was constructed by placing (1 − d/g 0 )KL (in the 1D habitat) or (1 − d/g 0 )KL 2 (in the 2D habitat) defectors at uniformly random positions, where L is the linear size of the habitat.
Default parameters.
The default values of model parameters are listed in Table 1. Here, we also provide the computational parameters used to simulate the model. Additional settings were used to calibrate the automated recognition of colonies; see below.

Computational procedures
Calculating cumulative effects of selection, drift, and mutational bias. The mathematical framework used to quantify selection, drift and mutational bias is described in S1 Text Section 3. We applied this calculation to each time step of the simulations, so that the cumulative effect of each of the evolutionary forces could be tracked (Fig 2A and S1 Fig).
For the analysis we need to obtain, for each individual i present right before the computational time step, the expectation value EðW i Þ of the number of offspring W i it will have after the time step (also counting the individual itself if it survives). To do so, first the growth rate g i was calculated and subsequently the reproduction and death probabilities P r = g i δt and P d = d δt over this time step. From the simulation scheme (see above) the expectation value can then be derived: This expression is used in the calculations. We note that this expectation value is conditioned on the current state of the simulation, in particular the population density D(x i ) and availability of public good A(x i ) at the position of the individual. In other words, only the effects of the inherent randomness of reproduction and death given the state of the local neighborhood are accounted as random drift; the fact that the state of the local neighborhood itself is also affected by random events in the past, such as the stochastic motility and demographics of others, is not. (See also S1 Text Section 3.) Calculating the local population density. To efficiently calculate the local population density (Kernel Density Estimate or KDE) D(x; σ rc ) (Eq 1), first a matrix was constructed that specifies, for each position in the habitat, the number of individuals at that position. The KDE at each position, taking into account the periodic boundary conditions, is the circular convolution of this occupancy matrix with the periodic summation of the (discretized approximation of the) Gaussian kernel G rc (x;σ rc ). To perform this convolution, we use the Circular Convolution Theorem, which states that the circular convolution of two matrices can be obtained by first calculating their Discrete Fourier Transform (DFT) and then calculating the inverse DFT of their element-wise product.
Calculating the availability of public good. The public good available at each position, A(x; σ a ), is calculated in a similar way. First a matrix is constructed that contains, for each position in the habitat, the sum of the trait values of all individuals present at that position. The value of A(x; σ a ) for each position is now obtained as the convolution of this matrix with the (discretized approximation of the) Gaussian kernel G a (x; σ a ), again using the Circular Convolution Theorem.
Calculating the radial distribution function. The radial distribution function or paircorrelation function g(r) is defined as the observed number of pairs of individuals separated by a distance r, relative to the expected number under the null model assuming that each individual is placed at a random position.
In the 1D case, the distance r can only take on values kδx, where k is a non-negative integer. Call the population size n and the size of the habitat in grid cells X. To calculate the expected number of pairs at distance r = kδx, written as E(r), we note that the number of individuals o x at position x is binomially distributed under the null model. Its expectation value is E½o x � ¼ n=X and hence EðrÞ ¼ P XÀ 1 x¼0 E½o x o ðxþk mod nÞ � � n 2 =X. (Here, we used that the occupancies of different sites are to good approximation independent.) The observed number of pairs of individuals found at a distance r = kδx, called O(r), is precisely given by the auto-correlation of the occupancy matrix, which is again efficiently calculated using the Circular Convolution Theorem. For each value r = kδx, the radial distribution function is then obtained as g(r) = O(r)/E(r).
In the 2D case, the rectangular grid imposes that r can only take on values such that r 2 = (a 2 + b 2 )δx 2 , where a and b are integers. In addition, in calculating the expectation under the null model, the frequency F(r) with which each distance occurs in the grid has to be taken into account. (E.g., the distance 5δx occurs three times more often than the distance 6δx.) Under the same assumptions as made for the 1D case, the expectation is E(r) = F(r)n 2 /X 2 . To calculate the observed number of pairs O(r), the auto-correlation matrix of the occupancy matrix is used. Then g(r) = O(r)/E(r) is calculated for each admissible value of r. To obtain the plot in Fig 2C, the distances were subsequently binned.
Calculating the terms of MLS 1 and MLS 2. To obtain Fig 4 and S4 Fig, we divided the simulation into 2 000 time intervals of Δt = 80 generations and applied the analyses of MLS 1 and 2 to each time interval. The mathematical expressions for MLS 1 and 2 are briefly summarized in S1 Text Section 4 and Section 5. Here follows a description of the computational methods used.
For concreteness, let us focus on a particular interval (t 1 , t 2 ]. The first step is to calculate the selection differential S, defined as the covariance of ϕ and relative fitness w (Eq 2 in S1 Text Section 2). For the purpose of this analysis, the relative fitness w i of an individual i living at time t 1 is the number of offspring it has at time t 2 (the absolute fitness W i ), divided by the population mean W . To find these offspring numbers, each individual at time t 1 was assigned a unique ID that was subsequently inherited by all offspring. At time t 2 , a frequency table of ID values was constructed, which directly provided the fitness of each each individual at time t 1 . With this information, S can be calculated directly.
The analysis of MLS 1 splits S into two parts (see Eq 8 in S1 Text Section 4). It is sufficient to calculate the second term, S among , after which the first follows as S within = S − S among . To calculate S among first the geographical borders of all colonies at time t 1 were identified; the algorithm used for this is described in the next section. Next, each individual at t 1 was assigned to a colony. Then for each colony j we calculated its population size n j , its mean relative fitness {w; j} w and its mean trait value {ϕ; j} w . At this point, S among could be calculated from its definition.
The analysis of MLS 2 describes the dynamics from the perspective of the colonies (see Eq 9 in S1 Text). To determine the fitness of the colonies present at time t 1 , we had to count how many offspring colonies they have at time t 2 . This requires that we defined the borders between the colonies at time t 2 , but also that we traced the ancestor colony at t 1 for each offspring colony at t 2 ; the algorithm used is described in the next section. The other ingredient of Eq 9 in S1 Text is the trait value F of the colonies. Because F j = {ϕ; j} w (see S1 Text Section 5), these quantities were already calculated for MLS 1 and both terms of Eq 9 in S1 Text can be evaluated directly.
Automated recognition and tracking of colonies. To perform the multilevel selection analysis, the simulation had to automatically recognize colonies and track their ancestry. Because existing clustering algorithms are inefficient for 1D systems and/or difficult to adjust to our needs, we used our own heuristics.
Where to draw the border between neighboring colonies, and when to conclude that one colony has divided into two, is to some degree arbitrary. The results of the analyses, however, do not depend sensitively on such details as long as we use reasonable definitions and apply them consistently.
The basic idea is to identify the borders between colonies with local minima of the population density. However, local minima can also occur temporarily within colonies due to random fluctuations, and such minima should not be confused with true borders between colonies. To solve this, one might exclude local minima if the density at their position exceeds a set threshold, so that only "deep" minima are considered. Such a simple threshold rule can identify most colonies correctly, but issues arise during the binary fission of colonies. During this process the depth of the local minimum that separates the two daughter colonies fluctuates, and hence it is likely to cross the threshold multiple times. Consequently, the threshold rule tends to record multiple events of fission and fusion during a single process of colony division. Similarly, when a dwindling colony is about to disappear, the threshold rule tends to infer series of deaths and resurrections of the same colony.
To prevent this, the algorithm that was used in the simulations in fact uses two density thresholds: a low and a high one, T low and T high . When a new local minimum appears, a new border (and hence the birth of a new colony) is inferred only when the local density at the minimum drops below T low . In contrast, when an existing border is about to disappear, this is acknowledged only when the density at the associated local minimum rises above T high . The result is a hysteresis of sorts: when the density of a new minimum drops below T low for the first time, a colony is born; if afterwards the density at the border temporarily exceeds T low , the border is maintained unless it also exceeds T high .
To be precise, when performing the MLS analysis on the interval (t 1 , t 2 ], the borders of the colonies at time t 2 were constructed as follows: 1. Calculate a smoothed density. The smoothed density at each position was defined as a KDE with bandwidth σ a /2, taking into account the periodic boundary conditions. 2. Identify local minima. If the smoothed density at grid point x is written as ρ x , each x such that ρ x < ρ x+1 and ρ x < ρ x−1 marks a local minimum. (Because of the periodic boundary conditions, all indices should be read modulo X, the size of the grid.) 3. Determine tentative borders between colonies. First, local minima were selected with a density ρ x < T high ; the other minima were discarded. Each of the selected minimal was then associated with a tentative border which would later be further scrutinized. To ensure that no individuals can sit exactly at a border (causing ambiguity as to which colony it belongs to), borders were positioned between grid points. First, the derivative of the density at the position of each minimum was approximated as ρ 0 (x) = (ρ x+1 − ρ x−1 )/(2δx). If a minimum was located at grid point x, then a tentative border was placed between x and x + 1 if the derivative was negative, and between x − 1 and x if the derivative was positive.
4. Assign an ancestor to each tentative colony. The tentative borders implicitly also defined tentative colonies. For each tentative colony at time t 2 an ancestor colony at time t 1 was determined. To do so, we exploited that we have already traced back the ancestry of the individuals in the colonies. We then used the expectation that the ancestor colony P of a colony Q contains most, if not all, ancestors of the individuals that belong to Q. Based on this, we identified P as the ancestor colony that contains the largest fraction of the ancestors of the individuals belonging to Q.
5. Reject tentative borders that reflect fluctuations or incomplete divisions. If the colonies on either side of a tentative border had the same ancestral colony, this suggested that a colony division might have taken place. In this case, we compared the density at the corresponding minimum to the low threshold T low ; if the density was above that threshold, the border was rejected. All other tentative borders were now accepted, so that the identification of colonies at t 2 and their ancestor at time t 1 also became final.
6. Count the number of offspring of each ancestral colony. Now that the ancestors of colonies at time t 2 had been identified, the number of offspring-the absolute fitness-of each ancestral colony could be tabulated. If an ancestral colony had fitness 0, it must have died between t 1 and t 2 . If an ancestral colony had absolute fitness > 1 it must have divided. (In practice, a fitness above 2 did not occur because Δt is too short to support multiple consecutive divisions.) If a colony had fitness 1, the ancestral colony most likely survived the time interval without reproducing.
The thresholds T high and T low are parameters; we found that T high = 0.7K and T low = 0.2K worked well.
Inclusive fitness theory and Hamilton's rule. In S6 Fig we demonstrate that the simulations can also be analyzed using inclusive fitness theory. To do so, we essentially use a formulation originally due to Queller [68]; we briefly describe it in S1 Text Section 2. We applied these calculations separately to every computational time step of the simulation.
The method involves a regression variable � 0 i , which is usually defined as the sum or the mean of the trait values of all social interaction partners of individual i. In our simulation, each individual interact with all other individuals, albeit with an interaction strength that decays with their separation (see Eq 2). We therefore define � 0 i as a weighted sum: With this definition, the theory of S1 Text Section 7 can be applied directly.
Estimating the lattice constant of the hexagonal lattice by counting colonies. The lattice constant of the hexagonal pattern that emerges in the 2D model can be estimated by counting the number of colonies in the habitat. A hexagonal lattice is composed of equilateral triangles with side a and area ffi ffi ffi 3 p a 2 =2. The number of triangles is twice the number of nodes ν. In a large enough habitat of area L 2 , the number of nodes can then be estimated as n � L 2 =ð ffi ffi ffi which, given L = 102.4, corresponds to a � 8.2. This is consistent with the estimate based on the radial distribution function (Fig 2C).

Estimating error bars.
In the Results section and S1 Table we provide 95% confidence intervals for the means of all quantities plotted in Fig 4 and S4 Fig. Because data points in these time series are auto-correlated and the distributions of some quantities are skewed, the standard methods for calculating confidence intervals could not be used. Therefore, we applied the method described in Ref. [69].
Briefly, the idea is to divide the data series into blocks if length l and use the means of these blocks (rather than the original data points) to estimate the standard error of the mean (SEM). Starting with l = 2, if l is increased, the correlations between block means eventually become negligible and the estimates stabilize around a sensible value, which we determined by manual inspection and then rounded off conservatively. Moreover, because of the central limit theorem, the distribution of block means converges to a normal distribution, which justifies the use of t-statistics to estimate confidence intervals. Although the correct number of degrees of freedom to be used is poorly constrained (as it depends on the minimal value of l that is deemed large enough to remove correlations), it is in all cases large enough to ensure that the critical t-value for t 0.05(2) is near 2. We therefore estimated the 95% confidence interval as the sample mean ± twice the estimated SEM.
Software. Simulations were performed with custom software written in Fortran; the code is made available on the following GitHub repository: https://github.com/rutgerhermsen/ altruism.git. Statistics were performed in R version 3.6.1. Visualization was done in R, using ggplot2, and in Wolfram Mathematica 12.

Linear stability analysis
We here provide the details of the linear stability analysis for the 1D habitat that is presented in Fig 3B and 3C and S2 Fig. Mathematical analysis. Consider a population of individuals that each have the same level of altruism ϕ. If the carrying capacity is large, the dynamics if the density ρ(x, t) can be approximated by the following mean-field equation: Here, G a and G rc are the kernel functions used in Eqs 1 and 2 to define the availability of public good and the local density, respectively. The notation f � h stands for the convolution of functions f and h. Eq 7 has a homogeneous equilibrium solution ρ(x, t) = ρ 0 > 0; we ask under what conditions this solution is (linearly) unstable to periodic perturbations so that colonies can form spontaneously.
To find out, we first identify ρ 0 by equating Eq 7 to zero and solving for ρ(x, t) = ρ 0 . Ignoring the trivial solution ρ 0 = 0, the equation is quadratic and can be solved straightforwardly. Out of the two solutions, one is negative and hence irrelevant. The remaining solution depends on all parameters except for the diffusion constant k D .
We then consider a periodic perturbation rðx; tÞ ¼ r 0 þ �ðtÞ sin ð2px=lÞ ð8Þ with a very small (infinitesimal) amplitude �(0) and ask whether �(t) will grow or decay. To obtain a dynamic equation for �(t) we insert Eq 8 into Eq 7. In doing so, we have to work out the convolutions of the Gaussian kernel functions G a and G rc with the sine wave of Eq 8. From the Convolution Theorem it follows that, for any real-valued, normalized, symmetric kernel function f(x) that has a Fourier transformf ðoÞ, the convolution with a sine wave is again a sine wave, but with a reduced amplitude: In the specific case where f(x) is Gaussian with standard deviation σ, we get The convolutions with G a and G rc follow directly from Eqs 9 and 10. We then expand the resulting equation to first order in �(t). Because ρ 0 is the homogeneous solution, the zeroth-order term vanishes. The result is a linear equation of the form: where the factor E(λ) can be written as: |ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl {zffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl } altruism À 2d ps m l � � 2 |ffl ffl ffl ffl ffl ffl ffl ffl {zffl ffl ffl ffl ffl ffl ffl ffl } |ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl {zffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl ffl } resource competition : The solution of Eq 11 is exponential, with growth rate or eigenvalue E(λ). Hence, if Eq 12 is positive for some wavelength λ, perturbations with this wavelength are predicted to grow exponentially. Because demographic noise produces perturbations of any wavelength, this is expected to eventually give rise to periodic density fluctuations with a similar wavelength. Eq 12 provides considerable insight. It consists of three terms, expressing the effects of altruism, motility, and resource competition, as indicated. The three length scales in the system -the scale of altruism σ a , the scale of motility σ m , and the scale of competition σ rc -each appear in their appropriate term.
The first term, describing the effect of altruism, is the only positive one, and it scales with ϕ. This shows that altruism is required to obtain a positive eigenvalue for any wavelength λ. Indeed, altruism tends to amplify density differences: because the benefits of altruism grow with the number of altruists in the local neighborhood, its effect is to increase the reproduction rate in regions of high density, which tends to further increase that density. However, the equation shows that this positive contribution is exponentially suppressed if the wavelength λ is small relative to the scale of altruism σ a ; this is because the kernel function averages out such short waves within the social neighborhoods of individuals.
The second term reflects the effect of motility. Random motion (diffusion) is a homogenizing force and therefore quenches density fluctuations, as reflected in the negative sign of this term. However, because diffusion is famously slow on large length scales, only short wavelengths are strongly affected: if the wavelength λ exceeds the scale of motility (the typical distance traveled by an individual in a generation time) the contribution becomes small.
The third term describes the effect of resource competition. Resource competition reduces the reproduction rate in areas with a larger density and thus suppresses density differences, which explains that its contribution is negative. However, if the wavelength is small relative to the scale of competition σ rc , the density wave averages out within the competitive neighborhood of individuals and the homogenizing effect becomes weak.
Together, this clearly indicates in which regime we ought to expect colonies. Density fluctuations are suppressed by diffusion if their wavelength λ is smaller than σ m , and by resource competition if λ is increased far beyond σ rc . Instabilities are therefore expected only if there is a gap between these two regimes. At the same time, the positive contribution of altruism becomes weak if λ is smaller than σ a . For altruism to be effective in the "gap", σ a therefore should be chosen smaller than σ rc . In summary, instability requires that the diffusion constant is small enough, the scale of competition is large enough, and the scale of altruism is smaller than the scale of competition. Apart from these rules of thumb, Eq 12 can of course be evaluated numerically to make precise predictions; see S2 Fig The simulations were performed as usual and using default parameters except for the following adjustments: 1. All individuals were initialized with a trait value ϕ = 0.05.
2. In these simulations, we were interested in the ecological patterns of a population with fixed ϕ; therefore the mutation rate was set to μ = 0 to disable evolution. c. Calculate the Fourier transform of the KDE and identify the wave number with the largest amplitude.
After the simulation, the mean value of the variance was reported. It is this variance that is plotted in Fig 3C and S2(E) Fig. Also, the mean value of the wave number with the largest amplitude was reported; this wave number was transformed to a wave length, which was plotted in S2(F) Fig. Supporting information S1 Fig. Evolutionary forces and snapshots of replicates (2D habitat). Results are shown based on three simulations that were identical except that the random-number generator was initiated with different random seeds. Fig 2 presents results of Replicate 1. The three figures on top show the mean level of altruism through time (thick colored line). The rise in mean level of altruism can be decomposed into three contributions: natural selection, mutational bias, and random drift, using the method explained in S1 Text Section 3. Plotted are the cumulative contributions of natural selection (black) transmission (red) and genetic drift (green). In all cases, the main contribution is selection, which is consistently positive during the first stretch of the simulations. That said, a mutational bias is revealed as well (red lines). This bias arises because, in this simulation, negative values of ϕ were prohibited (see section Materials and methods) and hence mutations with negative effect are sometimes truncated, especially in individuals with a low trait value.  (Table 1), additionally assuming all individuals have ϕ = 0.05. Perturbations with small wavelength are quenched by motility; those with a long wavelength by resource competition. In between, a window exists (gray shading) of wavelengths that have a positive eigenvalue. This explains the colony formation in the default parameters. The wavelength with the largest eigenvalue (indicated with the red dot and black arrow) provides a prediction for the wavelength-the distance between neighboring colonies. (B) The largest eigenvalue is plotted as a function of the spatial scales in the system: the scale of motility σ m and the scale of competition σ rc . (Remember that the scale of altruism σ a is 1 by definition of the unit of length.) Otherwise, the assumptions are as in panel A. Colony formation is expected only in the linearly unstable regime, to the right of the red contour line. (C) For the same conditions used in panel B, the predicted wavelength is plotted. As a rule of thumb, it is somewhat smaller than 2σ rc . (D) Simulations were performed for the parameters indicated with black symbols in panels B, C, E, and F, assuming that all individuals have an immutable level of altruism ϕ = 0.05. Shown are the resulting radial distribution functions. As expected, the system is nearly homogeneous at σ rc = 1.5 (black circle, in the linearly stable regime), weak pair correlations are seen for σ rc = 3.0 (marginally unstable), and strong correlations emerge for σ rc = 4.5 (far in the unstable regime), indicating colony formation. (E) Simulation were performed for a large number of combinations of the spatial scales (19 × 21 = 399 in total); here, the variance of the local population density is plotted as a simple proxy for colony formation. To demonstrate the reproducibility of these results, this figure shows the same analysis for two additional replicates. The simulations for all three replicates were identical except that the random-number generator was initiated with a different random seed. All replicates show very similar trends. In particular, the marginal distributions of all quantities are highly consistent. Their statistics are summarized in S1 Table. (TIFF) S5 Fig. Separating the effects of reproduction and death to colony-level selection. Two mechanisms contribute to the selection of altruistic colonies: they reproduce more frequently and die less frequently. To evaluate the relative importance of these mechanisms we quantify their contribution to the colony-level selection as defined by the MLS 2 analysis; see S1 Text Section 5 and Section 6 for mathematical details. Shown are the results of a single representative simulation with default parameters. As in Fig 4, the analysis was applied to subsequent time intervals of 80 generations (1 000 computational time steps); the figure plots the total colony-level selection (black) as well as the components due to death (red) and reproduction (blue) events. The marginal histograms on the right-hand side are based on the second half of the simulation, after generation 80 000. Note that the components can be precisely zero if no death and/or reproduction events occur in the particular time interval, resulting in considerable overplotting. First, the results demonstrate that both components to selection are much more likely to be positive than negative (Binomial test, p < 10 −15 in both cases). The component due to death events was positive in 72.6% and negative in just 1.0% of the time steps, with an average value of (3.2 ± 0.4) × 10 −4 . The component due to reproduction events was positive in 62.3% and negative in 8.5% of the time steps, with an average value of (1.6 ± 0.2) × 10 −4 . We conclude that both mechanisms contribute to colony-level selection, but the component due to death is the larger by a factor of two. Here, it is demonstrated that such simulations can also be analyzed from the perspective of inclusive fitness theory. There are multiple ways to do so; here we essentially adopt a formalism based on a partial regression analysis [68]; see section Materials and methods and S1 Text Section 7 for details. Shown are the results of a single representative simulation with default parameters. In all panels, each data point represents an average of the plotted quantity over the preceding 1000 computational time steps (corresponding to 80 generations). (A) The method splits the selection differential S into two parts. The first part, plotted in red, reflects the selection due to the direct fitness effects: the cost of altruism for the actor. This term is consistently negative, which means that altruists face negative selection due to the direct fitness effects, as expected. The second part, plotted in blue, reflects the selection component due to indirect fitness effects: the benefits of altruism for the recipient. That this term is consistently positive indicates that the benefits must preferentially accrue to more altruistic individuals. Both terms approximately cancel out: the sum of both components S (black) is positive on average but much smaller in absolute value than either term separately. (B) The cost appearing in Hamilton's rule, measured as a (partial) regression coefficient (S1 Text Section 7), as a function of time. This cost is consistently positive, confirming that altruism is individually costly. (C) The benefit appearing in Hamilton's rule, demonstrating (as expected) that interacting with altruists yields a benefit.
(D) Relatedness among interaction partners is consistently positive. We note that, in the formulation of inclusive fitness theory used here, relatedness can legitimately become much larger than 1 (see S1 Text Section 7). (TIFF) S1 Table. Statistics of the multilevel selection analysis of Fig 4 and S4 Fig. Error bars for the means are given as 95% confidence intervals (see section Materials and methods). (TIFF) S1 Video. Dynamics of the simulation described in Fig 2. Video depicting the dynamics of the simulation described in Fig 2, which is also shown in S1 Fig as Replicate 1. Default parameters were used ( Table 1) (Table 1). In the video, the left-hand panel shows the positions of all individuals, as in S1 Video, using the same color scale as in Figs 2 and 3A. The righthand panel plots for each position in the habitat the value A, which can be interpreted as the amount of public good at that position, as provided by the altruists in the local environment. The ticks on the left-hand vertical axis show the scale of altruism, the ticks on the right-hand vertical axis the scale of competition. A high-quality version of this video is shared here: https://doi.org/10.5281/zenodo.5727313. (MP4) S1 Text. Evolustionary forces, multilevel selection, and inclusive fitness. In this article, several mathematical results are applied that have been derived long ago. For ease of reference and to facilitate readers who are not intimately familiar with this theory, this text briefly summarizes these results. (PDF) Formal analysis: Rutger Hermsen.